Talk overview

  • Introduction
    • Open Science and version control in archaeology
    • Git and its limitations
  • Git for geospatial data
    • Description and features of Kart
  • Practical applications of Kart in archaeology
  • Thoughts and conclusions

Introduction

Open Science and transparency of the process

  • One (of many) aims of Open Science: openness and transparency of the process behind data creation and results
  • “Data must have history” Strupler and Wilkinson (2017)

Wallis (2022)

Version control

  • Transparent process through “snapshots” at different stages (roll-back if necessary)
  • Provides a solution to the multiple iterations of correction and renaming of the same file
  • Greater accountability and better documentation (Kansa 2012)
  • Enhances Open Science practices (Marwick 2017)

Source: xkcd

Git

  • Distributed version control system
  • Originally developed to track changes in the Linux kernel
  • Also adapted to non-programming applications
  • Git is still not user-friendly software
  • Graphical frontends do not always help

Source: xkcd

Distributed version control and archaeology

  • Archaeology has come a long way in adopting version control
  • Applied mainly to programming/scripting and to manuscripts/publications
  • Some attempts to adapt it to fieldwork practices

Source: Strupler and Wilkinson (2017: 5)

Source: Strupler and Wilkinson (2017: 4)

Git and binary files

  • Binary files: images, Word documents, Excel files
  • Git is not as efficient with binary files as it is with plain text (the entire file is saved every time it changes)
  • Storage issues, harder to track changes
  • For text files, plain text can sometimes be the answer, but what about GIS and relational databases?

Example diff of a plain text file, with additions visible in green

Example diff of binary file, no change visible
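
The contrast between the two diffs can be reproduced with a minimal Python sketch. The survey table below is made up for illustration, and the zlib-compressed bytes are only a stand-in for a binary format (an assumption, not how any particular file type is laid out). It is meant to show the idea, not how Git stores objects internally.

```python
import difflib
import zlib

# Two versions of a tiny, made-up survey table stored as plain text (CSV).
v1 = "id,site,period\n1,Tell A,Bronze Age\n2,Tell B,Iron Age\n"
v2 = v1 + "3,Tell C,Hellenistic\n"

# Plain text: a line-based diff pinpoints exactly what was added or removed.
text_diff = [
    line
    for line in difflib.unified_diff(v1.splitlines(), v2.splitlines(), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("text diff:", text_diff)

# Binary stand-in: the same content, compressed. There are no lines to compare,
# so all a byte-level comparison can report is whether the two blobs differ.
b1 = zlib.compress(v1.encode("utf-8"))
b2 = zlib.compress(v2.encode("utf-8"))
binary_changed = b1 != b2
print("binary blobs differ:", binary_changed)
```

The text diff names the added row, while the binary comparison only says that something changed. That is the gap the following slides address for GIS data.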

What about geospatial data?

  • In GIS, the research process is often obscured by the point-and-click nature of the GUI
  • QGIS models can help make some analyses reproducible
  • Scripts for data cleaning

For many in archaeology, for whom using GIS to visualise results is essentially a graphical-based point-and-click process, advocating a return to code may seem like a backward step. We understand the arguments for usability, and acknowledge that intermediate tools which can bridge point-and-click with code-based approaches are desperately required.

Strupler and Wilkinson (2017)

Git for geospatial data

Git for distributed version control of geospatial data

Kart features

  • Works with different file formats: GeoPackage, PostgreSQL/PostGIS, MySQL, Microsoft SQL Server
  • Supports most geospatial data types: vectors, rasters, point clouds, LiDAR, etc.
  • Planned support for shapefiles
  • “Built on git, works like git”
  • Ships with its own version of Git and Git Large File Storage (LFS)
  • No need to have Git installed

Kart features

  • Tracks changes at the row and cell level within each layer
  • Command Line Interface tool
  • Standard Git workflow
    • kart status
    • kart add
    • kart commit
    • kart pull
    • kart push
    • kart log
    • kart switch/branch
  • Scriptable
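
Because Kart is a command-line tool, the workflow above can be driven from a script. The following is a minimal sketch, assuming `kart` is on the PATH. The helper names `kart` and `save_and_share` and the commit message are hypothetical, and only subcommands listed on this slide are used. By default it only prints the commands (dry run), so it is safe to try without a repository.

```python
import shlex
import shutil
import subprocess


def kart(*args, dry_run=True):
    """Build a Kart command; run it only if dry_run is False and Kart is installed."""
    cmd = ["kart", *args]
    if dry_run or shutil.which("kart") is None:
        # Dry run (or Kart not found): show what would be executed.
        print("$ " + shlex.join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


def save_and_share(message, dry_run=True):
    """Inspect, commit and push the current changes (hypothetical helper)."""
    return [
        kart("status", dry_run=dry_run),
        kart("commit", "-m", message, dry_run=dry_run),
        kart("push", dry_run=dry_run),
    ]


steps = save_and_share("Add survey units")
```

Calling `save_and_share(..., dry_run=False)` inside a Kart repository would execute the same three steps for real.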

Kart QGIS Plugin

  • QGIS plugin offers a Graphical User Interface
  • (Almost) all the Kart commands are available
  • Visual tool to inspect changes

Remote Collaboration

  • Host data in remote repositories (GitHub, Codeberg, GitLab, SourceHut, etc.)
  • Compatible with all QGIS styles
  • Potential to mitigate common issues with data sharing

Kart way of storing data

  • Data are broken down into an SQL-like tabular model
  • This structure is visible in the remote repository, not in the working copy (local folder)
  • The GeoPackage (or any other working-copy format) is not present in the Kart remote repository
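
Why a tabular, per-record model enables row- and cell-level tracking can be illustrated with a small Python sketch. This is not Kart's actual storage format. The `fid` key, the sample rows and the `diff_layers` function are all assumptions made for illustration.

```python
def diff_layers(old_rows, new_rows, key="fid"):
    """Compare two versions of a layer, record by record and cell by cell."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())

    updated = {}
    for pk in sorted(old.keys() & new.keys()):
        # Keep only the cells whose value changed: column -> (old, new).
        cells = {
            col: (old[pk].get(col), new[pk].get(col))
            for col in sorted(old[pk].keys() | new[pk].keys())
            if old[pk].get(col) != new[pk].get(col)
        }
        if cells:
            updated[pk] = cells

    return {"inserted": inserted, "deleted": deleted, "updated": updated}


# Hypothetical layer before and after an editing session.
before = [
    {"fid": 1, "name": "Site 1", "period": "Bronze Age"},
    {"fid": 2, "name": "Site 2", "period": "Iron Age"},
]
after = [
    {"fid": 1, "name": "Site 1", "period": "Early Bronze Age"},
    {"fid": 3, "name": "Site 3", "period": "Hellenistic"},
]

changes = diff_layers(before, after)
print(changes)
```

Each record is addressed by its primary key, so an edit to one cell shows up as one small change rather than as a new copy of the whole file.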


Kart for archaeology

  • Fieldwork (no internet connection needed unless you push changes to a remote)
    • Remote repository can also be another folder
  • Desk-based work
    • Collaboration inside projects
  • Uphold Open Science practices

By Ainsley Seago (2014) CC BY 4.0

Practical use of Kart - Research Project

  • Collaboration between project members
    • Simple Git workflow
    • Different branches for each person, pushing and merging to main
  • Keeping track of dataset changes
    • Transparency of the process
    • File (and methods) history
    • Inspect beyond the final product

Practical use of Kart - Fieldwork

  • Archaeological survey (ReLand Project)
  • Workday branch (fieldwork raw data from QField)
  • Main branch (refined data, end of season)
  • Push to remote repo (HTTPS and external drive) for daily backup
  • No collaboration between team members (yet)
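
The daily routine described above could be planned with a short Python sketch. The branch naming scheme, the remote names `origin` and `backup-drive`, the commit message and the function `daily_plan` are hypothetical. The `-c` flag of `kart switch` and the `kart push <remote> <branch>` form are assumed to mirror Git and should be checked against `kart switch --help` and `kart push --help`. The sketch only builds the commands and does not run them.

```python
from datetime import date


def daily_plan(day, remotes=("origin", "backup-drive")):
    """Return the Kart commands for one fieldwork day as argument lists."""
    branch = f"workday-{day.isoformat()}"
    plan = [
        # Create and switch to the branch holding the day's raw data (assumed flag).
        ["kart", "switch", "-c", branch],
        # Record the day's raw data as one snapshot.
        ["kart", "commit", "-m", f"Raw QField data, {day.isoformat()}"],
    ]
    # One push per backup target (e.g. HTTPS remote and external drive).
    plan += [["kart", "push", remote, branch] for remote in remotes]
    return plan


plan = daily_plan(date(2023, 9, 14))
for command in plan:
    print(" ".join(command))
```

Refined data would then be merged into the main branch at the end of the season, as on the slide.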

Practical use of Kart - limitations

  • Not many issues until now (few people)
  • Collaboration tested on two macOS versions (13 Ventura and 12 Monterey); issues with macOS 11 Big Sur
  • Kart also tested on Ubuntu-based Linux (Pop!_OS)
  • Conflicts with primary keys when working with GeoPackages
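
One plausible way such primary key conflicts can arise (an assumption, since the slides do not give the cause) is that two people add features to separate copies of the same layer and the database hands out the same auto-incremented key to both. The sketch below imitates this with SQLite, the engine underlying GeoPackage. The table, names and the `reassign` helper are hypothetical, and this is not how Kart itself resolves conflicts.

```python
import sqlite3


def new_copy():
    """Create an in-memory table mimicking a layer with an integer primary key."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sites (fid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
    con.executemany("INSERT INTO sites (name) VALUES (?)", [("Site 1",), ("Site 2",)])
    return con


def add_site(con, name):
    """Insert a feature and return the key SQLite assigned to it."""
    return con.execute("INSERT INTO sites (name) VALUES (?)", (name,)).lastrowid


def reassign(con, old_fid, taken):
    """Move a feature to the first key above all keys already in use."""
    new_fid = max(taken) + 1
    con.execute("UPDATE sites SET fid = ? WHERE fid = ?", (new_fid, old_fid))
    return new_fid


# Two team members start from identical copies and each record one new site.
alice, bob = new_copy(), new_copy()
alice_fid = add_site(alice, "Site recorded by Alice")
bob_fid = add_site(bob, "Site recorded by Bob")
print("keys assigned:", alice_fid, bob_fid)  # same key for two different features

# Illustrative fix: give Bob's feature a key nobody else has used.
bob_new_fid = reassign(bob, bob_fid, taken={1, 2, alice_fid})
bob_fids = [row[0] for row in bob.execute("SELECT fid FROM sites ORDER BY fid")]
print("Bob's keys after reassignment:", bob_fids)
```

Agreeing on separate key ranges per person before fieldwork would be another way to avoid the collision in the first place.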

Spreading the word - Wiki

  • Public project wiki
  • How to use the dataset and how to use Kart
  • Tips to solve common issues
  • Methodology and conventions
  • Internal use and external reference
  • Updated as the project proceeds

Spreading the word - Kart Tutorial

  • Step-by-step tutorial for installing and basic use of Kart for archaeology
  • Part of (Titolo and Palmisano in press)
  • Sample dataset provided
  • Built with Quarto (reproducible)

Conclusions

Advantages

  • Git-based tool plus a graphical solution for those unfamiliar with Git
  • Fieldwork (no internet connection needed unless you push changes to a remote)
  • Kart can fit well into archaeological Open Science practices
  • More transparency both during and after the data creation process
  • Lack of a single file to download from online repositories (site stewardship)

Disadvantages

  • Not an easily accessible tool
  • Graphical interface still needs more work
  • Solving primary key conflicts requires the command-line
  • Documentation is still catching up with recent development
    • Contribution to upstream

Works Cited

Coup, R. (2022a). Kart: An introduction to practical data versioning for geospatial.
Coup, R. (2022b). Kart: A Practical Tool for Versioning Geospatial Data.
Coup, R. (2023). 2023 QGIS Data Versioning with Kart - Robert Coup.
Kansa, E. (2012). Openness and Archaeology’s Information Ecosystem. World Archaeology 44: 498–520.
Kart Contributors (2023). Kart geospatial data version-control software.
Marwick, B. (2017). Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation. Journal of Archaeological Method and Theory 24: 424–450.
Olaya, V. (2022). Spatial data versioning with the Kart QGIS Plugin with Victor Olaya.
Strupler, N. and Wilkinson, T. C. (2017). Reproducibility in the field: Transparency, version control and collaboration on the project panormos survey.
Titolo, A. and Palmisano, A. (in press). Using kart and GitHub for versioning and collaborating with spatial data in archaeological research. Archeologia e Calcolatori.
Wallis, K. (2022). Open Science: A practical guide for PhD students, University College London.